{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 利用机器学习进行手写字体识别的过程解析\n",
    "\n",
    "\n",
    "## 概述\n",
    "\n",
    "本课程作为之数字识别主题教程的第一节课。\n",
    "在这个主题课程里，我们不光用到了opencv还用到了python的一个机器学习的库**sklearn**, 虽说opencv自带机器学习的API，但是对于阿凯来讲sklearn用着毕竟顺手一些。\n",
    "\n",
    "在本节课，旨在给大家一个关于**机器学习流程**的宏观印象。不需要注重算法底层细节，只需走马观花看个大概。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 引入依赖"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2\n",
    "import numpy as np\n",
    "from sklearn import datasets,svm\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据预处理\n",
    "\n",
    "\n",
    "**数据预处理是整个机器学习过程中， 耗费最大时间跟精力的地方， 大概会占据你80%以上的工作量。**\n",
    "\n",
    "\n",
    "就拿数字识别来讲，数据预处理所需要做的事情如下：\n",
    "\n",
    "* **去除背景** 是将带有背景的数字提取出来。也有可能需要自己手动抠。\n",
    "* **图像变换** 纸张可能会有畸变，需要对其进行投影变换。或者2D仿射变换，旋转等。\n",
    "* **图像滤波** 可能会有噪声，图像中夹杂者一些小噪点。需要进行一些数学形态学运算。\n",
    "* **字符分割** 数字与数字之间可能有重叠，也由可能有连接，数据预处理需要将两者分开。\n",
    "* **图像二值化** \n",
    "* **样本标注与配比** 每种数字各选多少个，而且要注意书写风格尽可能不能一样。标注图片与数字之间的对应关系。\n",
    "\n",
    "\n",
    "**字符分割是整个图像预处理中最难的地方** \n",
    "![Screenshot_20180301_154627.png](./image/Screenshot_20180301_154627.png)\n",
    "\n",
    "在课程的后半部分，我会介绍几个经典的字符分割的算法，但是不在此展开。\n",
    "\n",
    "\n",
    "我们还是从简单一些的样例开始。\n",
    "\n",
    "我们在的 **CH5.3 图像2D仿射变换** 这一节中教给大家如何从画布中提取数字区域。\n",
    "\n",
    "![number_minarearect_canvas2.png](./image/number_minarearect_canvas2.png)\n",
    "\n",
    "并最终放缩到统一尺寸，如下图所示。\n",
    "\n",
    "![Screenshot_20180219_171151.png](./image/Screenshot_20180219_171151.png)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 特征提取\n",
    "\n",
    "特征，也称之为描述子，可以代表一个对象的标签。\n",
    "例如描述一个人，我们可以采用这些描述子： 身高，体重，生日....\n",
    "随着描述子数据量的增大，后续的计算量也就会很大，一般来讲需要的样本量也会很大。\n",
    "\n",
    "所以，如何用少量的信息，描述原来数据的大部分特征，这个就是特征提取所要做的事情。 当然数据量的减少，必然带来信息的丢失。\n",
    "\n",
    "3维空间的实物，可以拍成1000×1000像素的照片，通过照片我们可以判断这是个什么物体，而不需要输入整体的三维数据。（**降维**）\n",
    "\n",
    "可能因为内容简单， 转换为8×8的图片你也能看出来这是个啥(**缩放** )，接下来还可能用5个数值或者更少的数值代表这个8×8的矩阵（**PCA主成分分析**， 抓大放小）\n",
    "\n",
    "也可以将这个8×8的图片，变成一个长度为64的1维向量。本次演示中，就是这么做的。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[  7.5     15.9375  15.9375  15.9375  15.9375  15.9375  15.9375  15.9375\n",
      "  14.25    15.9375  15.9375  15.9375  15.9375  15.9375  15.9375  15.9375\n",
      "   0.       0.       0.       1.      15.9375  15.9375  15.9375   4.8125\n",
      "   0.       0.       1.3125  15.9375  15.9375  15.9375   0.       0.       0.\n",
      "   0.       3.      15.9375  15.9375   3.       0.       0.       0.       0.\n",
      "  15.9375  15.9375  15.9375   0.       0.       0.       0.       0.\n",
      "  15.9375  15.9375   3.8125   0.       0.       0.       0.       4.9375\n",
      "  15.9375  15.       0.0625   0.       0.       0.    ]\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAD8CAYAAABaQGkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACpRJREFUeJzt3V+IHfUZxvHn6UZpt9oNttsim9DkQgJeNbIIkio0YolV\nkl70IgGFSGFvoigtiPaqvfKmiL0oQohawVRpo4KI1QoqqdJaNzFtTTaWNKRkg3YNGv9CQ+rbi52U\nVVPPnJzfnJnz5vuBhZyzw5z3JHwzs7NnZhwRApDTF9oeAEBzCBxIjMCBxAgcSIzAgcQIHEiMwIHE\nCBxIjMCBxJY1sdLx8fGYmJhoYtWfMTU1NZTXAbrkyJEjOn78uHst10jgExMT2rp1axOr/oy77rpr\nKK8DdMn09HSt5dhFBxIjcCAxAgcSI3AgMQIHEiNwIDECBxIjcCCxWoHb3mD7dduHbN/R9FAAyugZ\nuO0xSb+UdK2kSyVtsX1p04MBGFydLfjlkg5FxOGIOCnpEUmbmh0LQAl1Ap+SdHTJ4/nqOQAdV+wg\nm+0Z27O2Zz/66KNSqwUwgDqBH5O0csnjFdVznxAR2yNiOiKmx8fHS80HYAB1An9F0iW2V9s+X9Jm\nSU80OxaAEnqeDx4Rp2zfLOkZSWOS7o+I/Y1PBmBgtS74EBFPSXqq4VkAFMYn2YDECBxIjMCBxAgc\nSIzAgcQIHEiMwIHECBxIrJE7m0xOTmrbtm1NrBpAH9iCA4kROJAYgQOJETiQGIEDiRE4kBiBA4kR\nOJAYgQOJ1bmzyf22F2y/NoyBAJRTZwv+K0kbGp4DQAN6Bh4RuyW9PYRZABTGz+BAYo3cuujtt9ng\nA11QLPClty666KKLSq0WwADYRQcSq/Nrsocl/VHSGtvztn/Y/FgASqhzb7ItwxgEQHnsogOJETiQ\nGIEDiRE4kBiBA4kROJAYgQOJETiQmCOi/Ert8is9By1fvnxor/XOO+8M7bUy27hx41BeZ/fu3Tpx\n4oR7LccWHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIjMCBxOpcdHGl7edtH7C93/atwxgM\nwOB6XnRR0ilJP46IvbYvlLTH9rMRcaDh2QAMqM69yd6IiL3Vn9+XNCdpqunBAAyuzhb8f2yvkrRW\n0stn+N6MpJkiUwEoonbgti+Q9Kik2yLivU9/PyK2S9peLcvpokAH1DqKbvs8Lca9MyIea3YkAKXU\nOYpuSfdJmouIu5sfCUApdbbg6yTdKGm97X3V1/cangtAAXXuTfaipJ6XhgHQPXySDUiMwIHECBxI\njMCBxAgcSIzAgcQIHEiMwIHE+jqb7Fw3OTk51NdbWFgY6utltPhJ63MXW3AgMQIHEiNwIDECBxIj\ncCAxAgcSI3AgMQIHEiNwILE6F138ou0/2/5Ldeuinw1jMACDq/NR1X9LWh8RH1SXT37R9u8i4k8N\nzwZgQHUuuhiSPqgenld9cWMDYATUvfHBmO19khYkPRsRZ7x1ke1Z27OlhwRwdry4ga65sL1c0uOS\nbomI1z5nuZRbeM4mGz2ZzyaLiJ5vrq+j6BFxQtLzkjac7VAAhqfOUfTJasst21+SdI2kg00PBmBw\ndY6iXyzpQdtjWvwP4TcR8WSzYwEooc5R9L9q8Z7gAEYMn2QDEiNwIDECBxIjcCAxAgcSI3AgMQIH\nEiNwIDFuXdSHqamptkdIYe1aPjc1LGzBgcQIHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHEagde\nXRv9Vdtcjw0YEf1swW+VNNfUIADKq3tnkxWSrpO0o9lxAJRUdwt+j6TbJX3c4CwACqtz44PrJS1E\nxJ4ey3FvMqBj6mzB10naaPuIpEckrbf90KcXiojtETEdEdOFZwRwlnoGHhF3RsSKiFglabOk5yLi\nhsYnAzAwfg8OJNbXFV0i4gVJLzQyCYDi2IIDiRE4kBiBA4kROJAYgQOJETiQGIEDiRE4kJgjovxK\n7fIr7YAm/q7ORbbbHiGFiOj5F8kWHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIrNYlm6or\nqr4v6T+STnHlVGA09HNNtu9ExPHGJgFQHLvoQGJ1Aw9Jv7e9x/ZMkwMBKKfuLvq3I+KY7a9Letb2\nwYjYvXSBKnziBzqk79NFbf9U0gcR8fPPWSbleZWcLloGp4uWUeR0Udtftn3h6T9L+q6k1wYfD0DT\n6uyif0PS49X/ussk/Toinm50KgBFcEWXPrCLXga76GVwRRfgHEfgQGIEDiRG4EBiBA4kRuBAYgQO\nJEbgQGL9nA+OxK688sq2R0AD2IIDiRE4kBiBA4kROJAYgQOJETiQGIEDiRE4kBiBA4nVCtz2ctu7\nbB+0PWf7iqYHAzC4uh9V/YWkpyPiB7bPlzTe4EwACukZuO0JSVdJ2ipJEXFS0slmxwJQQp1d9NWS\n3pL0gO1Xbe+oro8OoOPqBL5M0mWS7o2ItZI+lHTHpxeyPWN71vZs4RkBnKU6gc9Lmo+Il6vHu7QY\n/CdExPaImObe4UB39Aw8It6UdNT2muqpqyUdaHQqAEXUPYp+i6Sd1RH0w5Juam4kAKXUCjwi9kli\n1xsYMXySDUiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIbOTvTbZp06a2R2jMu+++O7TXeuml\nl4b2WhgetuBAYgQOJEbgQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGI9A7e9xva+JV/v2b5tGMMBGEzP\nj6pGxOuSviVJtsckHZP0eMNzASig3130qyX9IyL+2cQwAMrq92STzZIePtM3bM9Imhl4IgDF1N6C\nVzc92Cjpt2f6PrcuArqnn130ayXtjYh/NTUMgLL6CXyL/s/uOYBuqhV4dT/wayQ91uw4AEqqe2+y\nDyV9teFZABTGJ9mAxAgcSIzAgcQIHEiMwIHECBxIjMCBxAgcSMwRUX6l9luS+j2l9GuSjhcfphuy\nvjfeV3u+GRGTvRZqJPCzYXs265loWd8b76v72EUHEiNwILEuBb697QEalPW98b46rjM/gwMor0tb\ncACFdSJw2xtsv277kO072p6nBNsrbT9v+4Dt/bZvbXumkmyP2X7V9pNtz1KS7eW2d9k+aHvO9hVt\nzzSI1nfRq2ut/12LV4yZl/SKpC0RcaDVwQZk+2JJF0fEXtsXStoj6fuj/r5Os/0jSdOSvhIR17c9\nTym2H5T0h4jYUV1odDwiTrQ919nqwhb8ckmHIuJwRJyU9IikTS3PNLCIeCMi9lZ/fl/SnKSpdqcq\nw/YKSddJ2tH2LCXZnpB0laT7JCkiTo5y3FI3Ap+SdHTJ43klCeE026skrZX0cruTFHOPpNslfdz2\nIIWtlvSWpAeqHz92VNcjHFldCDw12xdIelTSbRHxXtvzDMr29ZIWImJP27M0YJmkyyTdGxFrJX0o\naaSPCXUh8GOSVi55vKJ6buTZPk+Lce+MiCxXpF0naaPtI1r8cWq97YfaHamYeUnzEXF6T2uXFoMf\nWV0I/BVJl9heXR3U2CzpiZZnGphta/FnubmIuLvteUqJiDsjYkVErNLiv9VzEXFDy2MVERFvSjpq\ne0311NWSRvqgaL/3JisuIk7ZvlnSM5LGJN0fEftbHquEdZJulPQ32/uq534SEU+1OBN6u0XSzmpj\nc1jSTS3PM5DWf00GoDld2EUH0BACBxIjcCAxAgcSI3AgMQIHEiNwIDECBxL7L6cMeOJdo+HNAAAA\nAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f03995ffac8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 数据可视化， 将向量重新变回图片\n",
    "# 利用matplotlib\n",
    "\n",
    "def showDigit(vector):\n",
    "    # 重新变形\n",
    "    img = vector.reshape((8, 8))\n",
    "    plt.imshow(img, cmap='gray')\n",
    "    plt.show()\n",
    "\n",
    "def cvtImg2Vect(img):\n",
    "    # 将灰度图变为向量\n",
    "    # 变换为 8×8\n",
    "    img = cv2.resize(img, (8,8), interpolation = cv2.INTER_LINEAR)\n",
    "    # 变为向量 并将数值放缩在 0-16之间\n",
    "    return np.reshape(img, (64)) / 16\n",
    "\n",
    "# 读入样例图片\n",
    "demo_digit_img = cv2.imread('./number_bin/7.png', cv2.IMREAD_GRAYSCALE)\n",
    "demo_digit_vect = cvtImg2Vect(demo_digit_img)\n",
    "\n",
    "# 提取好的向量 data\n",
    "print(demo_digit_vect)\n",
    "showDigit(demo_digit_vect)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据标注\n",
    "\n",
    "给每个特征进行分类，并添加标记，标注其对应的分类。必须要一一对应。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 读入数据\n",
    "\n",
    "为了做数字识别你需要做大量的图片，图像预处理，然后标注。\n",
    "好在sklearn中有一个预设的手写数字的数据库，里面有1797个手写数字，所以我们用sklearn中自带的数据库进行模型训练。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 读入手写字体的\n",
    "digits = datasets.load_digits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 载入数据 data\n",
    "data = digits.data\n",
    "# 载入标签 label\n",
    "target = digits.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  0.,   0.,   5., ...,   0.,   0.,   0.],\n",
       "       [  0.,   0.,   0., ...,  10.,   0.,   0.],\n",
       "       [  0.,   0.,   0., ...,  16.,   9.,   0.],\n",
       "       ..., \n",
       "       [  0.,   0.,   1., ...,   6.,   0.,   0.],\n",
       "       [  0.,   0.,   2., ...,  12.,   0.,   0.],\n",
       "       [  0.,   0.,  10., ...,  12.,   1.,   0.]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 打印数据\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.,   0.,   0.,  13.,\n",
       "        15.,  10.,  15.,   5.,   0.,   0.,   3.,  15.,   2.,   0.,  11.,\n",
       "         8.,   0.,   0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.,   0.,\n",
       "         5.,   8.,   0.,   0.,   9.,   8.,   0.,   0.,   4.,  11.,   0.,\n",
       "         1.,  12.,   7.,   0.,   0.,   2.,  14.,   5.,  10.,  12.,   0.,\n",
       "         0.,   0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 打印第一行数据 长度为64\n",
    "data[0] "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAD8CAYAAABaQGkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACrdJREFUeJzt3V+IXOUZx/Hfr6vSWq2G1hbZDU0iEpBCjQkBSRGaxBKr\naC9qSEChUlhvFKUFjb3rnVdiL4oQolYwVbpRQcRqE1Ss0Fp3Y2xNNpZ0sWQXbSKJRL1oSHx6sScQ\nJXbOZs5558zj9wOL+2fY95nEb87Z2ZnzOiIEIKevDHoAAO0hcCAxAgcSI3AgMQIHEiNwIDECBxIj\ncCAxAgcSO6eNb2o75dPjFi1aVHS90dHRYmsdO3as2Fpzc3PF1jp58mSxtUqLCPe6TSuBZ7V+/fqi\n691///3F1tq1a1extbZs2VJsraNHjxZbq4s4RQcSI3AgMQIHEiNwIDECBxIjcCAxAgcSI3AgsVqB\n295g+x3bB2yXe5YCgL70DNz2iKTfSrpO0hWSNtu+ou3BAPSvzhF8taQDETETEcclPSnppnbHAtCE\nOoGPSjp42sez1ecAdFxjLzaxPS5pvKnvB6B/dQKfk7T4tI/Hqs99RkRslbRVyvtyUWDY1DlFf0PS\n5baX2j5P0iZJz7Y7FoAm9DyCR8QJ23dIelHSiKRHImJv65MB6Futn8Ej4nlJz7c8C4CG8Uw2IDEC\nBxIjcCAxAgcSI3AgMQIHEiNwIDECBxJjZ5MFKLnTiCQtW7as2Folt2U6cuRIsbU2btxYbC1JmpiY\nKLpeLxzBgcQIHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHE6uxs8ojtQ7bfLjEQgObUOYL/TtKG\nlucA0IKegUfEq5LKPXkYQGP4GRxIjK2LgMQaC5yti4Du4RQdSKzOr8mekPQXScttz9r+eftjAWhC\nnb3JNpcYBEDzOEUHEiNwIDECBxIjcCAxAgcSI3AgMQIHEiNwILGh37po5cqVxdYquZWQJF122WXF\n1pqZmSm21s6dO4utVfL/D4mtiwAUROBAYgQOJEbgQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGJ1Lrq4\n2PbLtvfZ3mv7rhKDAehfneein5D0y4jYbftCSVO2d0bEvpZnA9CnOnuTvRcRu6v3P5I0LWm07cEA\n9G9BryazvUTSCkmvn+FrbF0EdEztwG1fIOkpSXdHxLHPf52ti4DuqfUouu1zNR/39oh4ut2RADSl\nzqPolvSwpOmIeKD9kQA0pc4RfI2kWyWttb2nevtxy3MBaECdvclek+QCswBoGM9kAxIjcCAxAgcS\nI3AgMQIHEiNwIDECBxIjcCCxod+bbNGiRcXWmpqaKraWVHa/sJJK/zl+mXEEBxIjcCAxAgcSI3Ag\nMQIHEiNwIDECBxIjcCAxAgcSq3PRxa/a/pvtt6qti35dYjAA/avzVNX/SlobER9Xl09+zfYfI+Kv\nLc8GoE91LroYkj6uPjy3emNjA2AI1N34YMT2HkmHJO2MiDNuXWR70vZk00MCODu1Ao+IkxFxpaQx\nSattf+8Mt9kaEasiYlXTQwI4Owt6FD0iPpT0sqQN7YwDoEl1HkW/xPbF1ftfk3StpP1tDwagf3Ue\nRb9U0mO2RzT/D8IfIuK5dscC0IQ6j6L/XfN7ggMYMjyTDUiMwIHECBxIjMCBxAgcSIzAgcQIHEiM\nwIHE2LpoAXbt2lVsrcxK/p0dPXq02FpdxBEcSIzAgcQIHEiMwIHECBxIjMCBxAgcSIzAgcQIHEis\nduDVtdHftM312IAhsZAj+F2SptsaBEDz6u5sMibpeknb2h0HQJPqHsEflHSPpE9bnAVAw+psfHCD\npEMRMdXjduxNBnRMnSP4Gkk32n5X0pOS1tp+/PM3Ym8yoHt6Bh4R90XEWEQskbRJ0ksRcUvrkwHo\nG78HBxJb0BVdIuIVSa+0MgmAxnEEBxIjcCAxAgcSI3AgMQIHEiNwIDECBxIjcCCxod+6qOTWNCtX\nriy2VmkltxMq+ec4MTFRbK0u4ggOJEbgQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGIEDiRW65ls1RVV\nP5J0UtIJrpwKDIeFPFX1hxHxQWuTAGgcp+hAYnUDD0l/sj1le7zNgQA0p+4p+g8iYs72tyXttL0/\nIl49/QZV+MQPdEitI3hEzFX/PSTpGUmrz3Abti4COqbO5oNft33hqfcl/UjS220PBqB/dU7RvyPp\nGdunbv/7iHih1akANKJn4BExI+n7BWYB0DB+TQYkRuBAYgQOJEbgQGIEDiRG4EBiBA4kRuBAYo6I\n5r+p3fw3/QLLli0rtZQmJyeLrSVJt99+e7G1br755mJrlfw7W7Uq70sjIsK9bsMRHEiMwIHECBxI\njMCBxAgcSIzAgcQIHEiMwIHECBxIrFbgti+2vcP2ftvTtq9uezAA/at7XfTfSHohIn5q+zxJ57c4\nE4CG9Azc9kWSrpH0M0mKiOOSjrc7FoAm1DlFXyrpsKRHbb9pe1t1fXQAHVcn8HMkXSXpoYhYIekT\nSVs+fyPb47YnbZd9yRWAL1Qn8FlJsxHxevXxDs0H/xlsXQR0T8/AI+J9SQdtL68+tU7SvlanAtCI\nuo+i3ylpe/UI+oyk29obCUBTagUeEXskceoNDBmeyQYkRuBAYgQOJEbgQGIEDiRG4EBiBA4kRuBA\nYgQOJDb0e5OVND4+XnS9e++9t9haU1NTxdbauHFjsbUyY28y4EuOwIHECBxIjMCBxAgcSIzAgcQI\nHEiMwIHECBxIrGfgtpfb3nPa2zHbd5cYDkB/el50MSLekXSlJNkekTQn6ZmW5wLQgIWeoq+T9K+I\n+HcbwwBoVt3rop+ySdITZ/qC7XFJZV+NAeD/qn0ErzY9uFHSxJm+ztZFQPcs5BT9Okm7I+I/bQ0D\noFkLCXyzvuD0HEA31Qq82g/8WklPtzsOgCbV3ZvsE0nfbHkWAA3jmWxAYgQOJEbgQGIEDiRG4EBi\nBA4kRuBAYgQOJNbW1kWHJS30JaXfkvRB48N0Q9b7xv0anO9GxCW9btRK4GfD9mTWV6JlvW/cr+7j\nFB1IjMCBxLoU+NZBD9CirPeN+9VxnfkZHEDzunQEB9CwTgRue4Ptd2wfsL1l0PM0wfZi2y/b3md7\nr+27Bj1Tk2yP2H7T9nODnqVJti+2vcP2ftvTtq8e9Ez9GPgpenWt9X9q/ooxs5LekLQ5IvYNdLA+\n2b5U0qURsdv2hZKmJP1k2O/XKbZ/IWmVpG9ExA2Dnqcpth+T9OeI2FZdaPT8iPhw0HOdrS4cwVdL\nOhARMxFxXNKTkm4a8Ex9i4j3ImJ39f5HkqYljQ52qmbYHpN0vaRtg56lSbYvknSNpIclKSKOD3Pc\nUjcCH5V08LSPZ5UkhFNsL5G0QtLrg52kMQ9KukfSp4MepGFLJR2W9Gj148e26nqEQ6sLgadm+wJJ\nT0m6OyKODXqeftm+QdKhiJga9CwtOEfSVZIeiogVkj6RNNSPCXUh8DlJi0/7eKz63NCzfa7m494e\nEVmuSLtG0o2239X8j1NrbT8+2JEaMytpNiJOnWnt0HzwQ6sLgb8h6XLbS6sHNTZJenbAM/XNtjX/\ns9x0RDww6HmaEhH3RcRYRCzR/N/VSxFxy4DHakREvC/poO3l1afWSRrqB0UXujdZ4yLihO07JL0o\naUTSIxGxd8BjNWGNpFsl/cP2nupzv4qI5wc4E3q7U9L26mAzI+m2Ac/Tl4H/mgxAe7pwig6gJQQO\nJEbgQGIEDiRG4EBiBA4kRuBAYgQOJPY/qbaNczQ1iIEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f03412389b0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "showDigit(data[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, ..., 8, 9, 8])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 打印标签 label\n",
    "target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 分割数据集\n",
    "\n",
    "将数据分为训练集跟测试集。训练集用于训练SVM模型,测试集用于检验SVM模型的准确率。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1797\n"
     ]
    }
   ],
   "source": [
    "# 查看一共有多少个sample\n",
    "n_digit = len(data)\n",
    "print(n_digit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对原来的数据进行重排序， 随机打乱"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([   0,    1,    2, ..., 1794, 1795, 1796])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "idxs = np.array(range(n_digit))\n",
    "idxs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1690, 1667,  346, ...,  222, 1066,  374])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 利用shuffle 对索引序列进行重新排序\n",
    "np.random.shuffle(idxs)\n",
    "idxs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 利用numpy的索引进行重新排序\n",
    "data = data[idxs]\n",
    "target = target[idxs]\n",
    "\n",
    "# 分割训练集跟测试集\n",
    "# 获取训练集\n",
    "n_train = int(n_digit*0.80) # 取前80%作为训练集\n",
    "train_digit = data[:n_train]\n",
    "train_label = target[:n_train]\n",
    "\n",
    "# 获取测试集\n",
    "n_test = n_digit - n_train # 测试样本的个数\n",
    "test_digit = data[n_train:] # 测试样本的特征\n",
    "test_label = target[n_train:] # 测试样本的标签"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 选择机器学习模型\n",
    "\n",
    "根据经验选择机器学习模型， 这里我们选择的是SVM 支持向量机\n",
    "目前为止，你不需要知道SVM的具体算法，只需要将其当作黑箱，喂养给这个模型很多好吃的（数据）， 他就会越来越聪明，提高分类的准确率。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建一个分类器，类型为SVM 支持向量机\n",
    "# 并定义超参数 gamma=0.001\n",
    "classifier = svm.SVC(gamma=0.001)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
       "  decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',\n",
       "  max_iter=-1, probability=False, random_state=None, shrinking=True,\n",
       "  tol=0.001, verbose=False)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 开始fit\n",
    "# 输入训练集， 调参\n",
    "classifier.fit(train_digit, train_label)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型预测\n",
    "\n",
    "将样本输入训练好的模型， 模型会给出自己的预测结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取样本输入\n",
    "sample_digit_vect = test_digit[0]\n",
    "# 获取真实的标签 （已知）\n",
    "sample_digit_label = test_label[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAD8CAYAAABaQGkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACrBJREFUeJzt3d+LHfUZx/HPp6vS2NgutLZINmZzIQEpJJEQkBSxEUus\nonvRiwQUKoVcKcYWRHtl/wHZXhQhRE3AVGmjBhGrFXS1QmtN4rY12VjSoGSDNsYajV40RJ9e7ASi\npD1zcr4zZ87T9wsW98dhv88hvJ3Zs7PzdUQIQE5fGfYAAJpD4EBiBA4kRuBAYgQOJEbgQGIEDiRG\n4EBiBA4kdkET39R2ysvjFi9e3Op6l19+eWtrLVq0qLW1Mvvggw9aWef48eM6efKkez2ukcCzWrNm\nTavrTU9Pt7bWypUrW1srsx07drSyzv3331/rcZyiA4kROJAYgQOJETiQGIEDiRE4kBiBA4kROJBY\nrcBtb7D9lu1Dtu9teigAZfQM3PaYpF9JukHSlZI22b6y6cEADK7OEXytpEMRcTgiTkl6XNItzY4F\noIQ6gS+RdOSsj+erzwHouGJ/bGJ7s6TNpb4fgMHVCfyopKVnfTxRfe4LImKrpK1S3j8XBUZNnVP0\n1yVdYXu57YskbZT0dLNjASih5xE8Ik7bvkPS85LGJD0cEfsbnwzAwGr9DB4Rz0p6tuFZABTGlWxA\nYgQOJEbgQGIEDiRG4EBiBA4kRuBAYgQOJMbOJn3YsmVLq+tNTk62ttbLL7/c2lrj4+OtrdX2ji27\nd+9uZZ0TJ07UehxHcCAxAgcSI3AgMQIHEiNwIDECBxIjcCAxAgcSI3AgsTo7mzxs+5jtN9sYCEA5\ndY7g2yVtaHgOAA3oGXhEvCLpXy3MAqAwfgYHEmPrIiCxYoGzdRHQPZyiA4nV+TXZY5L+KGmF7Xnb\nP2l+LAAl1NmbbFMbgwAoj1N0IDECBxIjcCAxAgcSI3AgMQIHEiNwIDECBxJzRPnLxrkWHf/LzMxM\na2vV3eKnlKmpqdbWigj3egxHcCAxAgcSI3AgMQIHEiNwIDECBxIjcCAxAgcSI3AgMQIHEqtz08Wl\ntl+yfcD2ftt3tTEYgMHVuS/6aUk/i4h9ti+RtNf2CxFxoOHZAAyozt5k70bEvur9k5LmJC1pejAA\ng+trZxPbk5JWS3rtHF9j6yKgY2oHbnuxpCckbYmIj7/8dbYuArqn1qvoti/UQtw7I+LJZkcCUEqd\nV9Et6SFJcxHxQPMjASilzhF8naTbJK23PVu9/bDhuQAUUGdvslcl9bw1DIDu4Uo2IDECBxIjcCAx\nAgcSI3AgMQIHEiNwIDECBxLr66/J/t+Nj4+3ul6b+2q1uafWqlWrUq7VRRzBgcQIHEiMwIHECBxI\njMCBxAgcSIzAgcQIHEiMwIHE6tx08au2/2z7L9XWRb9oYzAAg6tzqeq/Ja2PiE+q2ye/avt3EfGn\nhmcDMKA6N10MSZ9UH15YvbGxATAC6m58MGZ7VtIxSS9ExDm3LrK9x/ae0kMCOD+1Ao+IzyJilaQJ\nSWttf/ccj9kaEWsiYk3pIQGcn75eRY+IE5JekrShmXEAlFTnVfRLbY9X7y+SdL2kg00PBmBwdV5F\nv0zSDttjWvgfwm8i4plmxwJQQp1X0f+qhT3BAYwYrmQDEiNwIDECBxIjcCAxAgcSI3AgMQIHEiNw\nIDG2LurD7Oxsq+u1uXXR5ORka2u16dprr211ve3bt7e6Xi8cwYHECBxIjMCBxAgcSIzAgcQIHEiM\nwIHECBxIjMCBxGoHXt0b/Q3b3I8NGBH9HMHvkjTX1CAAyqu7s8mEpBslbWt2HAAl1T2CT0u6R9Ln\nDc4CoLA6Gx/cJOlYROzt8Tj2JgM6ps4RfJ2km22/LelxSettP/rlB7E3GdA9PQOPiPsiYiIiJiVt\nlPRiRNza+GQABsbvwYHE+rqjS0TMSJppZBIAxXEEBxIjcCAxAgcSI3AgMQIHEiNwIDECBxIjcCAx\nti7qw/j4eKvrLVu2rLW1Pvroo9bWanN7n5mZmdbW6iKO4EBiBA4kRuBAYgQOJEbgQGIEDiRG4EBi\nBA4kRuBAYrWuZKvuqHpS0meSTnPnVGA09HOp6vcj4nhjkwAojlN0ILG6gYek39vea3tzkwMBKKfu\nKfr3IuKo7W9LesH2wYh45ewHVOETP9AhtY7gEXG0+u8xSU9JWnuOx7B1EdAxdTYf/JrtS868L+kH\nkt5sejAAg6tziv4dSU/ZPvP4X0fEc41OBaCInoFHxGFJK1uYBUBh/JoMSIzAgcQIHEiMwIHECBxI\njMCBxAgcSIzAgcQcEeW/qV3+m3ZA21sXffjhh62tdffdd7e21vT0dGtrZRYR7vUYjuBAYgQOJEbg\nQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGK1Arc9bnuX7YO252xf3fRgAAZX977ov5T0XET8yPZFki5u\ncCYAhfQM3PY3JF0j6ceSFBGnJJ1qdiwAJdQ5RV8u6X1Jj9h+w/a26v7oADquTuAXSLpK0oMRsVrS\np5Lu/fKDbG+2vcf2nsIzAjhPdQKflzQfEa9VH+/SQvBfwNZFQPf0DDwi3pN0xPaK6lPXSTrQ6FQA\niqj7KvqdknZWr6AflnR7cyMBKKVW4BExK4lTb2DEcCUbkBiBA4kROJAYgQOJETiQGIEDiRE4kBiB\nA4kROJBY3UtVIWlqaqrV9d55553W1tq9e3dra6E9HMGBxAgcSIzAgcQIHEiMwIHECBxIjMCBxAgc\nSIzAgcR6Bm57he3Zs94+tr2ljeEADKbnpaoR8ZakVZJke0zSUUlPNTwXgAL6PUW/TtI/IqK9i6QB\nnLd+/9hko6THzvUF25slbR54IgDF1D6CV5se3Czpt+f6OlsXAd3Tzyn6DZL2RcQ/mxoGQFn9BL5J\n/+X0HEA31Qq82g/8eklPNjsOgJLq7k32qaRvNjwLgMK4kg1IjMCBxAgcSIzAgcQIHEiMwIHECBxI\njMCBxBwR5b+p/b6kfv+k9FuSjhcfphuyPjee1/Asi4hLez2okcDPh+09Wf8SLetz43l1H6foQGIE\nDiTWpcC3DnuABmV9bjyvjuvMz+AAyuvSERxAYZ0I3PYG22/ZPmT73mHPU4LtpbZfsn3A9n7bdw17\nppJsj9l+w/Yzw56lJNvjtnfZPmh7zvbVw55pEEM/Ra/utf53LdwxZl7S65I2RcSBoQ42INuXSbos\nIvbZvkTSXklTo/68zrD9U0lrJH09Im4a9jyl2N4h6Q8Rsa260ejFEXFi2HOdry4cwddKOhQRhyPi\nlKTHJd0y5JkGFhHvRsS+6v2TkuYkLRnuVGXYnpB0o6Rtw56lJNvfkHSNpIckKSJOjXLcUjcCXyLp\nyFkfzytJCGfYnpS0WtJrw52kmGlJ90j6fNiDFLZc0vuSHql+/NhW3Y9wZHUh8NRsL5b0hKQtEfHx\nsOcZlO2bJB2LiL3DnqUBF0i6StKDEbFa0qeSRvo1oS4EflTS0rM+nqg+N/JsX6iFuHdGRJY70q6T\ndLPtt7Xw49R6248Od6Ri5iXNR8SZM61dWgh+ZHUh8NclXWF7efWixkZJTw95poHZthZ+lpuLiAeG\nPU8pEXFfRExExKQW/q1ejIhbhzxWERHxnqQjtldUn7pO0ki/KNrv3mTFRcRp23dIel7SmKSHI2L/\nkMcqYZ2k2yT9zfZs9bmfR8SzQ5wJvd0paWd1sDks6fYhzzOQof+aDEBzunCKDqAhBA4kRuBAYgQO\nJEbgQGIEDiRG4EBiBA4k9h8Qt4flz2zocAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f03413f6208>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "showDigit(sample_digit_vect)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "正确标签 ： 7\n"
     ]
    }
   ],
   "source": [
    "print(\"正确标签 ： {}\".format(sample_digit_label))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([7])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 利用模型预测\n",
    "classifier.predict([sample_digit_vect])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample_digit_predict = classifier.predict([sample_digit_vect])[0]\n",
    "# 判断是否准确\n",
    "sample_digit_predict == sample_digit_label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**预测我们自己预处理的样本数据**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAD8CAYAAABaQGkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACpRJREFUeJzt3V+IHfUZxvHn6UZpt9oNttsim9DkQgJeNbIIkio0YolV\nkl70IgGFSGFvoigtiPaqvfKmiL0oQohawVRpo4KI1QoqqdJaNzFtTTaWNKRkg3YNGv9CQ+rbi52U\nVVPPnJzfnJnz5vuBhZyzw5z3JHwzs7NnZhwRApDTF9oeAEBzCBxIjMCBxAgcSIzAgcQIHEiMwIHE\nCBxIjMCBxJY1sdLx8fGYmJhoYtWfMTU1NZTXAbrkyJEjOn78uHst10jgExMT2rp1axOr/oy77rpr\nKK8DdMn09HSt5dhFBxIjcCAxAgcSI3AgMQIHEiNwIDECBxIjcCCxWoHb3mD7dduHbN/R9FAAyugZ\nuO0xSb+UdK2kSyVtsX1p04MBGFydLfjlkg5FxOGIOCnpEUmbmh0LQAl1Ap+SdHTJ4/nqOQAdV+wg\nm+0Z27O2Zz/66KNSqwUwgDqBH5O0csnjFdVznxAR2yNiOiKmx8fHS80HYAB1An9F0iW2V9s+X9Jm\nSU80OxaAEnqeDx4Rp2zfLOkZSWOS7o+I/Y1PBmBgtS74EBFPSXqq4VkAFMYn2YDECBxIjMCBxAgc\nSIzAgcQIHEiMwIHECBxIrJE7m0xOTmrbtm1NrBpAH9iCA4kROJAYgQOJETiQGIEDiRE4kBiBA4kR\nOJAYgQOJ1bmzyf22F2y/NoyBAJRTZwv+K0kbGp4DQAN6Bh4RuyW9PYRZABTGz+BAYo3cuujtt9ng\nA11QLPClty666KKLSq0WwADYRQcSq/Nrsocl/VHSGtvztn/Y/FgASqhzb7ItwxgEQHnsogOJETiQ\nGIEDiRE4kBiBA4kROJAYgQOJETiQmCOi/Ert8is9By1fvnxor/XOO+8M7bUy27hx41BeZ/fu3Tpx\n4oR7LccWHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIjMCBxOpcdHGl7edtH7C93/atwxgM\nwOB6XnRR0ilJP46IvbYvlLTH9rMRcaDh2QAMqM69yd6IiL3Vn9+XNCdpqunBAAyuzhb8f2yvkrRW\n0stn+N6MpJkiUwEoonbgti+Q9Kik2yLivU9/PyK2S9peLcvpokAH1DqKbvs8Lca9MyIea3YkAKXU\nOYpuSfdJmouIu5sfCUApdbbg6yTdKGm97X3V1/cangtAAXXuTfaipJ6XhgHQPXySDUiMwIHECBxI\njMCBxAgcSIzAgcQIHEiMwIHE+jqb7Fw3OTk51NdbWFgY6utltPhJ63MXW3AgMQIHEiNwIDECBxIj\ncCAxAgcSI3AgMQIHEiNwILE6F138ou0/2/5Ldeuinw1jMACDq/NR1X9LWh8RH1SXT37R9u8i4k8N\nzwZgQHUuuhiSPqgenld9cWMDYATUvfHBmO19khYkPRsRZ7x1ke1Z27OlhwRwdry4ga65sL1c0uOS\nbomI1z5nuZRbeM4mGz2ZzyaLiJ5vrq+j6BFxQtLzkjac7VAAhqfOUfTJasst21+SdI2kg00PBmBw\ndY6iXyzpQdtjWvwP4TcR8WSzYwEooc5R9L9q8Z7gAEYMn2QDEiNwIDECBxIjcCAxAgcSI3AgMQIH\nEiNwIDFuXdSHqamptkdIYe1aPjc1LGzBgcQIHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHEagde\nXRv9Vdtcjw0YEf1swW+VNNfUIADKq3tnkxWSrpO0o9lxAJRUdwt+j6TbJX3c4CwACqtz44PrJS1E\nxJ4ey3FvMqBj6mzB10naaPuIpEckrbf90KcXiojtETEdEdOFZwRwlnoGHhF3RsSKiFglabOk5yLi\nhsYnAzAwfg8OJNbXFV0i4gVJLzQyCYDi2IIDiRE4kBiBA4kROJAYgQOJETiQGIEDiRE4kJgjovxK\n7fIr7YAm/q7ORbbbHiGFiOj5F8kWHEiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIrNYlm6or\nqr4v6T+STnHlVGA09HNNtu9ExPHGJgFQHLvoQGJ1Aw9Jv7e9x/ZMkwMBKKfuLvq3I+KY7a9Letb2\nwYjYvXSBKnziBzqk79NFbf9U0gcR8fPPWSbleZWcLloGp4uWUeR0Udtftn3h6T9L+q6k1wYfD0DT\n6uyif0PS49X/ussk/Toinm50KgBFcEWXPrCLXga76GVwRRfgHEfgQGIEDiRG4EBiBA4kRuBAYgQO\nJEbgQGL9nA+OxK688sq2R0AD2IIDiRE4kBiBA4kROJAYgQOJETiQGIEDiRE4kBiBA4nVCtz2ctu7\nbB+0PWf7iqYHAzC4uh9V/YWkpyPiB7bPlzTe4EwACukZuO0JSVdJ2ipJEXFS0slmxwJQQp1d9NWS\n3pL0gO1Xbe+oro8OoOPqBL5M0mWS7o2ItZI+lHTHpxeyPWN71vZs4RkBnKU6gc9Lmo+Il6vHu7QY\n/CdExPaImObe4UB39Aw8It6UdNT2muqpqyUdaHQqAEXUPYp+i6Sd1RH0w5Juam4kAKXUCjwi9kli\n1xsYMXySDUiMwIHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIbOTvTbZp06a2R2jMu+++O7TXeuml\nl4b2WhgetuBAYgQOJEbgQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGI9A7e9xva+JV/v2b5tGMMBGEzP\nj6pGxOuSviVJtsckHZP0eMNzASig3130qyX9IyL+2cQwAMrq92STzZIePtM3bM9Imhl4IgDF1N6C\nVzc92Cjpt2f6PrcuArqnn130ayXtjYh/NTUMgLL6CXyL/s/uOYBuqhV4dT/wayQ91uw4AEqqe2+y\nDyV9teFZABTGJ9mAxAgcSIzAgcQIHEiMwIHECBxIjMCBxAgcSMwRUX6l9luS+j2l9GuSjhcfphuy\nvjfeV3u+GRGTvRZqJPCzYXs265loWd8b76v72EUHEiNwILEuBb697QEalPW98b46rjM/gwMor0tb\ncACFdSJw2xtsv277kO072p6nBNsrbT9v+4Dt/bZvbXumkmyP2X7V9pNtz1KS7eW2d9k+aHvO9hVt\nzzSI1nfRq2ut/12LV4yZl/SKpC0RcaDVwQZk+2JJF0fEXtsXStoj6fuj/r5Os/0jSdOSvhIR17c9\nTym2H5T0h4jYUV1odDwiTrQ919nqwhb8ckmHIuJwRJyU9IikTS3PNLCIeCMi9lZ/fl/SnKSpdqcq\nw/YKSddJ2tH2LCXZnpB0laT7JCkiTo5y3FI3Ap+SdHTJ43klCeE026skrZX0cruTFHOPpNslfdz2\nIIWtlvSWpAeqHz92VNcjHFldCDw12xdIelTSbRHxXtvzDMr29ZIWImJP27M0YJmkyyTdGxFrJX0o\naaSPCXUh8GOSVi55vKJ6buTZPk+Lce+MiCxXpF0naaPtI1r8cWq97YfaHamYeUnzEXF6T2uXFoMf\nWV0I/BVJl9heXR3U2CzpiZZnGphta/FnubmIuLvteUqJiDsjYkVErNLiv9VzEXFDy2MVERFvSjpq\ne0311NWSRvqgaL/3JisuIk7ZvlnSM5LGJN0fEftbHquEdZJulPQ32/uq534SEU+1OBN6u0XSzmpj\nc1jSTS3PM5DWf00GoDld2EUH0BACBxIjcCAxAgcSI3AgMQIHEiNwIDECBxL7L6cMeOJdo+HNAAAA\nAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f03415b3128>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "showDigit(demo_digit_vect)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([7])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预测我们自己预处理的样本数据\n",
    "classifier.predict([demo_digit_vect])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型评估"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "模型评估就是用来说明一个机器学习模型效果的好坏，准不准，需要借助某种指标或者说是度量．\n",
    "\n",
    "最简单的指标莫过于**准确率 Accuracy**了，　准确率＝成功匹配的个数 / 测试集总个数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([7, 7, 7, 3, 2, 0, 4, 5, 8, 1])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将训练集传入classifier\n",
    "test_predict = classifier.predict(test_digit)\n",
    "# 打印预测结果\n",
    "test_predict[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将真实的标签与预测标签对比， 获取布尔值\n",
    "result = test_predict == test_label\n",
    "result[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "356"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 统计为真的个数\n",
    "n_match = np.count_nonzero(result)\n",
    "n_match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy 准确率为：0.9889\n"
     ]
    }
   ],
   "source": [
    "# 获取准确率  成功匹配的个数 / 测试集总个数\n",
    "accuracy = n_match / n_test\n",
    "\n",
    "print(\"Accuracy 准确率为：%.4f\"%accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PS 不过，大家千万也别特别痴迷在准确率上，因为这样容易过拟合．\n",
    "在后续的教程里，阿凯会举例说明**过拟合**跟**欠拟合**的概念．"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型持久化\n",
    "训练好的模型，为了方便我们下次再次调用，需要将**分类器　classifier**对象持久化为二进制数据．\n",
    "下次使用的时候，直接导入模型，而不至于再重新训练模型．\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "# 将分类器 classfier 序列化为二进制数据\n",
    "clf_bin = pickle.dumps(classifier)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将二进制数据 存入文件中\n",
    "with open('digit_clf.bin','bw') as f:\n",
    "    f.write(clf_bin)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 从文件中重新读入序列化的二进制数据\n",
    "clf_bin2 = None\n",
    "with open('digit_clf.bin','br') as f:\n",
    "    clf_bin2 = f.read()\n",
    "# 将二进制数据还原为分类器对象\n",
    "classifier2 = pickle.loads(clf_bin2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPgAAAD8CAYAAABaQGkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACshJREFUeJzt3d2LXeUZhvH7bjQ0VmugtUWS0AkoASnMRCQgKWIjllhF\ne9CDBBQihRwpCS2I9ij9ByQ9KEKI2oCp0kYNIlYr6GCF1prESWu+ShqmZoI2SgkakYbo04NZKVFS\n9prsd717zeP1g+B8bOZ9duTKWrNnzXodEQKQ01dGPQCA7hA4kBiBA4kROJAYgQOJETiQGIEDiRE4\nkBiBA4ld0sUXtZ3y8riFCxdWXe+aa66pttaiRYuqrfXJJ59UW+udd96ptpYknT59utpaEeFBj3EX\nl6pmDXxsbKzqert376621vj4eLW19u/fX22tzZs3V1tLkiYnJ6ut1SZwTtGBxAgcSIzAgcQIHEiM\nwIHECBxIjMCBxAgcSKxV4LbX2j5i+6jtB7seCkAZAwO3vUDSryTdJuk6SettX9f1YACG1+YIvkrS\n0Yg4FhFnJD0l6a5uxwJQQpvAl0g6ft77M83HAPRcsd8ms71R0sZSXw/A8NoEfkLSsvPeX9p87HMi\nYpukbVLe3yYD5ps2p+hvSrrW9nLbCyWtk/Rct2MBKGHgETwiztq+T9JLkhZIeiwiDnQ+GYChtfoe\nPCJekPRCx7MAKIwr2YDECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIrJOti7KqvUvG9PR0tbVq7six\nadOmamtNTExUW0uq+/fYBkdwIDECBxIjcCAxAgcSI3AgMQIHEiNwIDECBxIjcCCxNjubPGb7pO23\nawwEoJw2R/BfS1rb8RwAOjAw8Ih4TdK/K8wCoDC+BwcSY+siILFigbN1EdA/nKIDibX5MdmTkv4k\naYXtGds/6X4sACW02ZtsfY1BAJTHKTqQGIEDiRE4kBiBA4kROJAYgQOJETiQGIEDiTmi/GXjNa9F\nHxsbq7WUpqamqq0l1d92p5aaf4+1/w5rbjcVER70GI7gQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGIE\nDiRG4EBiBA4k1uami8tsv2r7oO0DtjfVGAzA8NrcF/2spJ9FxD7bV0jaa/vliDjY8WwAhtRmb7J3\nI2Jf8/ZHkg5JWtL1YACGN6edTWyPSVop6Y0LfI6ti4CeaR247cslPS1pc0R8+MXPs3UR0D+tXkW3\nfalm494ZEc90OxKAUtq8im5Jj0o6FBEPdz8SgFLaHMFXS7pH0hrbU82fH3Y8F4AC2uxN9rqkgbeG\nAdA/XMkGJEbgQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGJz+m2yPlq8eHG1tW6++eZqa0l197nasmVL\ntbVqPq8vO47gQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGIEDiRG4EBibW66+FXbf7G9v9m66Bc1BgMw\nvDaXqv5H0pqION3cPvl127+PiD93PBuAIbW56WJIOt28e2nzh40NgHmg7cYHC2xPSTop6eWIuODW\nRbb32N5TekgAF6dV4BHxaURMSFoqaZXt717gMdsi4oaIuKH0kAAuzpxeRY+IU5JelbS2m3EAlNTm\nVfSrbC9u3l4k6VZJh7seDMDw2ryKfrWkHbYXaPYfhN9GxPPdjgWghDavov9Vs3uCA5hnuJINSIzA\ngcQIHEiMwIHECBxIjMCBxAgcSIzAgcQ8+9ughb+onfLXSWtukyRJk5OT1dYaHx+vtlZNO3bsqLre\nhg0bqq0VER70GI7gQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGIEDiRG4EBirQNv7o3+lm3uxwbME3M5\ngm+SdKirQQCU13Znk6WSbpe0vdtxAJTU9gi+VdIDkj7rcBYAhbXZ+OAOSScjYu+Ax7E3GdAzbY7g\nqyXdaXta0lOS1th+4osPYm8yoH8GBh4RD0XE0ogYk7RO0isRcXfnkwEYGj8HBxJrszfZ/0TEpKTJ\nTiYBUBxHcCAxAgcSI3AgMQIHEiNwIDECBxIjcCAxAgcSm9OFLl92p06dqrre9PR01fVq2bJlS7W1\nam7/1EccwYHECBxIjMCBxAgcSIzAgcQIHEiMwIHECBxIjMCBxFpdydbcUfUjSZ9KOsudU4H5YS6X\nqn4/Ij7obBIAxXGKDiTWNvCQ9Afbe21v7HIgAOW0PUX/XkScsP0tSS/bPhwRr53/gCZ84gd6pNUR\nPCJONP89KelZSasu8Bi2LgJ6ps3mg1+zfcW5tyX9QNLbXQ8GYHhtTtG/LelZ2+ce/5uIeLHTqQAU\nMTDwiDgmabzCLAAK48dkQGIEDiRG4EBiBA4kRuBAYgQOJEbgQGIEDiTG1kWQJG3durXaWrt37662\n1pcdR3AgMQIHEiNwIDECBxIjcCAxAgcSI3AgMQIHEiNwILFWgdtebHuX7cO2D9m+sevBAAyv7aWq\nv5T0YkT82PZCSZd1OBOAQgYGbvtKSTdJ2iBJEXFG0pluxwJQQptT9OWS3pf0uO23bG9v7o8OoOfa\nBH6JpOslPRIRKyV9LOnBLz7I9kbbe2zvKTwjgIvUJvAZSTMR8Ubz/i7NBv85bF0E9M/AwCPiPUnH\nba9oPnSLpIOdTgWgiLavot8vaWfzCvoxSfd2NxKAUloFHhFTkjj1BuYZrmQDEiNwIDECBxIjcCAx\nAgcSI3AgMQIHEiNwIDECBxJzRJT/onb5L9oDExMTVdebnJystlbN5zY9PV1trcwiwoMewxEcSIzA\ngcQIHEiMwIHECBxIjMCBxAgcSIzAgcQIHEhsYOC2V9ieOu/Ph7Y31xgOwHAG3nQxIo5ImpAk2wsk\nnZD0bMdzAShgrqfot0j6R0T8s4thAJTV9r7o56yT9OSFPmF7o6SNQ08EoJjWR/Bm04M7Jf3uQp9n\n6yKgf+Zyin6bpH0R8a+uhgFQ1lwCX6//c3oOoJ9aBd7sB36rpGe6HQdASW33JvtY0jc6ngVAYVzJ\nBiRG4EBiBA4kRuBAYgQOJEbgQGIEDiRG4EBiXW1d9L6kuf5K6TclfVB8mH7I+tx4XqPznYi4atCD\nOgn8Ytjek/U30bI+N55X/3GKDiRG4EBifQp826gH6FDW58bz6rnefA8OoLw+HcEBFNaLwG2vtX3E\n9lHbD456nhJsL7P9qu2Dtg/Y3jTqmUqyvcD2W7afH/UsJdlebHuX7cO2D9m+cdQzDWPkp+jNvdb/\nrtk7xsxIelPS+og4ONLBhmT7aklXR8Q+21dI2ivpR/P9eZ1j+6eSbpD09Yi4Y9TzlGJ7h6Q/RsT2\n5kajl0XEqVHPdbH6cARfJeloRByLiDOSnpJ014hnGlpEvBsR+5q3P5J0SNKS0U5Vhu2lkm6XtH3U\ns5Rk+0pJN0l6VJIi4sx8jlvqR+BLJB0/7/0ZJQnhHNtjklZKemO0kxSzVdIDkj4b9SCFLZf0vqTH\nm28/tjf3I5y3+hB4arYvl/S0pM0R8eGo5xmW7TsknYyIvaOepQOXSLpe0iMRsVLSx5Lm9WtCfQj8\nhKRl572/tPnYvGf7Us3GvTMistyRdrWkO21Pa/bbqTW2nxjtSMXMSJqJiHNnWrs0G/y81YfA35R0\nre3lzYsa6yQ9N+KZhmbbmv1e7lBEPDzqeUqJiIciYmlEjGn2/9UrEXH3iMcqIiLek3Tc9ormQ7dI\nmtcvis51b7LiIuKs7fskvSRpgaTHIuLAiMcqYbWkeyT9zfZU87GfR8QLI5wJg90vaWdzsDkm6d4R\nzzOUkf+YDEB3+nCKDqAjBA4kRuBAYgQOJEbgQGIEDiRG4EBiBA4k9l+NuZ+Hhs8BdgAAAABJRU5E\nrkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f0341562f28>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 测试用分类器重新进行分类\n",
    "\n",
    "dight_vect = test_digit[1]\n",
    "showDigit(dight_vect)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([7])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预测样本\n",
    "classifier2.predict([dight_vect])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
